Text Analysis for Clinical Research

Our current experiences at UBEP

ITALY/UniPD/DSCTVPH/UBEP/LAIMS/CorradoLanera.phd

Modern Tools for Text Analysis

Neural Network

RNN

Transformer

Modern Applications for Text Analysis

(L)LM

Modern Opportunities for Text Analysis

Modern UBEP on Text Analysis

Project                                Data                            Records       Tool               Test
Extending SRs                          citations                       7494 (14 SR)  RF/SVM             0.934-0.999 (AUC-ROC)
VZV detection                          Pediatrician free-text notes    60659         RNN                0.953 (AUC-ROC)
Otitis Classification                  Pediatrician free-text notes    297673        RNN                0.955 (Balanced F1)
Injuries Classification                ED discharge free-text notes    8194          GPT-4              99.5-1.0 (Accuracy)
SR Screening                           citations                       3080          Humans + ASReview  0.96-0.98 (AUC-ROC)
SR Screening                           citations                       24931         GPT-4o             fine-tuning
SR Screening                           citations                       535           GPT-4o             developing
Classification, Extraction & Matching  citations & registries records  594587        GPT-4o             running
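
Several of the projects above are evaluated with AUC-ROC. As a reminder of what that number measures, here is a minimal sketch in plain Python: the AUC equals the probability that a randomly chosen positive record receives a higher score than a randomly chosen negative one, with ties counted as one half. The function name `auc_roc` and the toy labels and scores are illustrative assumptions, not data or code from any of the listed projects.

```python
def auc_roc(labels, scores):
    """AUC-ROC via pairwise comparison of positive and negative scores.

    labels: iterable of 0/1 ground-truth classes (1 = positive).
    scores: iterable of classifier scores, higher = more likely positive.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC-ROC needs at least one positive and one negative")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0      # positive ranked above negative
            elif p == n:
                wins += 0.5      # ties count as half a win
    return wins / (len(pos) * len(neg))


# Hypothetical toy example: 3 positive and 3 negative records.
toy_labels = [1, 1, 0, 1, 0, 0]
toy_scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]
toy_auc = auc_roc(toy_labels, toy_scores)
print(round(toy_auc, 3))
```

The pairwise form is quadratic in the number of records, so it is only meant to make the definition concrete; for datasets of the sizes listed above a rank-based implementation from an established library would be the practical choice.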

Modern Comparisons of Text Analysis

Model          Type        Access          Computation  Accuracy (n = 320)
GPT-4o         pretrained  paid            remote       TBD
GPT-3.5-turbo  pretrained  paid            remote       TBD
Llama 3 70B    pretrained  free            local        98.6 (97.6 balanced)
Llama 3 8B     pretrained  free            local        92.2 (90.8 balanced)
Gemma 1.1 7B   pretrained  free            local        79.1 (75.7 balanced)
RF             fitted      self-developed  local        77.6
GBM            fitted      self-developed  local        62.8
NB             fitted      self-developed  local        60.6
SVM            fitted      self-developed  local        57.8
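
The table above reports plain accuracy next to a "balanced" value. Assuming "balanced" refers to balanced accuracy in the usual sense (the unweighted mean of per-class recall), the following plain-Python sketch shows how the two can differ when classes are imbalanced. The function names and the toy labels are hypothetical and are not taken from the comparison itself.

```python
from collections import defaultdict


def accuracy(y_true, y_pred):
    """Fraction of records whose predicted class equals the true class."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def balanced_accuracy(y_true, y_pred):
    """Unweighted mean of per-class recall (assumed meaning of 'balanced')."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    hits = defaultdict(int)    # correct predictions per true class
    totals = defaultdict(int)  # records per true class
    for t, p in zip(y_true, y_pred):
        totals[t] += 1
        hits[t] += int(t == p)
    recalls = [hits[c] / totals[c] for c in totals]
    return sum(recalls) / len(recalls)


# Hypothetical imbalanced toy example: 8 "injury" records, 2 "other".
y_true = ["injury"] * 8 + ["other"] * 2
y_pred = ["injury"] * 8 + ["injury", "other"]

print(accuracy(y_true, y_pred))           # 9 of 10 records correct
print(balanced_accuracy(y_true, y_pred))  # mean of recalls 1.0 and 0.5
```

In this toy case the majority class dominates plain accuracy, while balanced accuracy gives the minority class equal weight, which is one reason to report both on imbalanced clinical data.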

Modern Reality for Text Analysis

References

Lanera, C., P. Berchialla, A. Sharma, C. Minto, D. Gregori, and I. Baldi. 2019. “Screening PubMed Abstracts: Is Class Imbalance Always a Challenge to Machine Learning?” Systematic Reviews 8 (1). https://doi.org/10.1186/s13643-019-1245-8.
Lanera, Corrado, Ileana Baldi, Andrea Francavilla, Elisa Barbieri, Lara Tramontan, Antonio Scamarcia, Luigi Cantarutti, Carlo Giaquinto, and Dario Gregori. 2022. “A Deep Learning Approach to Estimate the Incidence of Infectious Disease Cases for Routinely Collected Ambulatory Records: The Example of Varicella-Zoster.” International Journal of Environmental Research and Public Health 19 (10): 5959. https://doi.org/10.3390/ijerph19105959.
Lanera, Corrado, Giulia Lorenzoni, Elisa Barbieri, Gianluca Piras, Arjun Magge, Davy Weissenbacher, Daniele Donà, et al. 2024. “Monitoring the Epidemiology of Otitis Using Free-Text Pediatric Medical Notes: A Deep Learning Approach.” Journal of Personalized Medicine 14 (1): 28. https://doi.org/10.3390/jpm14010028.
Lanera, Corrado, Clara Minto, Abhinav Sharma, Dario Gregori, Paola Berchialla, and Ileana Baldi. 2018. “Extending PubMed Searches to ClinicalTrials.gov Through a Machine Learning Approach for Systematic Reviews.” Journal of Clinical Epidemiology 103 (November): 22–30. https://doi.org/10.1016/j.jclinepi.2018.06.015.
Lorenzoni, Giulia, Dario Gregori, Silvia Bressan, Honoria Ocagli, Danila Azzolina, Liviana Da Dalt, and Paola Berchialla. 2024. “Use of a Large Language Model to Identify and Classify Injuries With Free-Text Emergency Department Data.” JAMA Network Open 7 (5): e2413208. https://doi.org/10.1001/jamanetworkopen.2024.13208.
OpenAI, Josh Achiam, Steven Adler, Sandhini Agarwal, Lama Ahmad, Ilge Akkaya, Florencia Leoni Aleman, et al. 2024. “GPT-4 Technical Report.” arXiv. https://doi.org/10.48550/arXiv.2303.08774.
Vaswani, Ashish, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, and Illia Polosukhin. 2023. “Attention Is All You Need.” arXiv. https://doi.org/10.48550/arXiv.1706.03762.

Thank you for your attention